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In this paper a method of obtaining smooth analytical estimates of probability densities, radial 
distribution functions and potentials of mean force from sampled data in a statistically controlled 
fashion is presented. The approach is general and can be applied to any density of a single random 
variable. The method outlined here avoids the use of histograms, which require the specification 
of a physical parameter (bin size) and tend to give noisy results. The technique is an extension 
of the Berg-Harris method [B.A. Berg and R.C. Harris, Comp. Phys. Comm. 179, 443 (2008)], 
which is typically inaccurate for radial distribution functions and potentials of mean force due to 
a non-uniform Jacobian factor. In addition, the standard method often requires a large number of 
Fourier modes to represent radial distribution functions, which tends to lead to oscillatory fits. It 
is shown that the issues of poor sampling due to a Jacobian factor can be resolved using a biased 
resampling scheme, while the requirement of a large number of Fourier modes is mitigated through 
an automated piecewise construction approach. The method is demonstrated by analyzing the radial 
distribution functions in an energy-discretized water model. In addition, the fitting procedure is 
illustrated on three more applications for which the original Berg-Harris method is not suitable, 
namely, a random variable with a discontinuous probability density, a density with long tails, and 
the distribution of the first arrival times of a diffusing particle to a sphere, which has both long 
tails and short-time structure. In all cases, the resampled, piecewise analytical fit outperforms the 
histogram and the original Berg-Harris method. 

PACS numbers: 05.20.Jj, 82.20.Wt, 02.70.Rr, 02.50.-r 



I. INTRODUCTION 

In many situations the probability to find a particle 
at a certain distance from a given other particle is of 
interest. This probability is given by the radial distri- 
bution function. In addition to providing detailed infor- 
mation on the local structure of a system, one can often 
express thermodynamic quantities such as the pressure, 
energy and compressibility in terms of the radial distri- 
bution functions.^ Furthermore, the radial distribution 
function can be reformulated in terms of a potential of 
mean force, which is of great importance in multi-scale 
simulations that take the potential of mean forc e as input 
in Langevin or Brownian dynamics simulations.!^ 

Radial distribution functions are usually constructed 
in simulations by forming a histogram of sampled inter- 
particle distances.'^! However, histograms contain the bin 
size as a free parameter, and can be rather noisy for 
a poorly selected value of this parameter. For prob- 
ability densities, an alternative method recently pro- 
posed by Berg and Harris avoids histograms and yields 
a smooth analytical form for the probability density 
which describes the sample data statistically at least as 
well as the noisy histogram.!^ This method has already 
proved useful in the context of the computation of quan- 
tum free energy differences from non-equilibrium work 
distributions,'^ the determination of the density in Bose- 
Einstein condensates,!^ and in the distribution of a reac- 
tion coordinate in simulations of chemical reactions.!^ 

Having a similar method for potentials of mean force 



and radial distribution functions would have many ad- 
vantages. As in the Berg-Harris method, such an ap- 
proach would avoid the noise that accompanies the stan- 
dard histogram method, while no a priori choice of bin 
size is required. Furthermore, a smooth radial distribu- 
tion would give a better representation of the potential of 
mean force (which is related to the logarithm of the radial 
distribution function), and allows the function to be eval- 
uated at any point in the range over which it is defined. 
Finally, the expressions for the pressure, energy and com- 
pressibility in terms of the radial distribution functions 
involve integrals of the radial distribution function over 
r. Such integrals can be evaluated more accurately from 
an analytic form than from a histogram, whose accuracy 
is restricted by the bin size. 

The purpose of this paper is to develop an approach to 
obtain smooth radial distribution functions and poten- 
tials of mean force from sampled inter-particle distances. 
The resulting method turns out to be suitable not just 
for radial distribution functions and potentials of mean 
force, but for a large class of densities of single random 
variables. 

The paper is structured as follows. Sec. |lT] presents a 
model of water in which rigid molecules interact through 
a discretized potential. Construction of radial distribu- 
tion functions from data derived from this model will be 
used as a running test case. This section also contains 
some details on the generation of sampled data through 
simulation. The Berg-Harris method for smoothing prob- 
ability densities in a statistically controlled fashion, and 
the connection between probability distribution functions 
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and radial distribution functions, are briefly reviewed in 



Sec. Ill In Sec. IV simulation results are presented that 
show the shortcoming of the Berg-Harris method for ra- 
dial distribution functions. The extended method is de- 
veloped in Sec. [Vj with Sec. V A containing an explana- 
tion for the poor performance and its solution through 
statistical resampling. Sec. |VB| addresses an additional 
problem related to the particular shape of radial distribu- 
tion functions, which is solved by extending the method 
to use piecewise analytic functions. In Sec. |VI[ the gener- 
ality of the method is illustrated by presenting the results 
of applying the method to data drawn from three differ- 
ent probability densities which are problematic for the 
original Berg-Harris method. The paper concludes with 
a discussion in Sec. IVIII 



II. SYSTEM: A DISCRETE WATER MODEL 

In the development of the smooth approximation 
method below, a model of rigid water molecules sub- 
ject to a discretized interaction potential between the 
molecules will be used as a running test case for the con- 
struction of radial distribution functions. Since a water 
molecule consists of two kinds of atoms (oxygen and hy- 
drogen), there are three radial distribution functions in 
this system, gooi ffOH and (?hh, which turn out to have 
quite different character and therefore give a more strin- 
gent test of the smooth fitting methods than a single- 
atom model would. 

The relative distances between the atomic sites in 
molecules are fixed, making the molecules rigid bodies. 
Each state of each body i can therefore be described by 
its center-of-mass position r^, its orientation or attitude 
matrix that transforms coordinates from the lab frame 
to a body-fixed frame for molecule i, and the associated 
linear and angular momenta. Here the body-fixed frame 
is chosen so that the third row of the matrix A^ corre- 
sponds to the direction of the molecule's dipole fii, whose 
magnitude is fixed at a value /i. 

The interaction potential between a pair of molecules 
i and j is a discrete version of the soft sticky dipole 
potential 



v,j = u^''(ri-rj)+w''P(r.-rj, A,, Aj)-^w''P(r,-rj, A„ Aj), 

(1) 

where the Lennard- Jones, dipole, and sticky parts of the 
potential are, respectively, given by 
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where Oij and 



ij are the conventional spherical angles of 
the vector A^r, and 9ji and are those of A^r. Finally, 
the switching function s(r) is defined as 



sir) 



1 




for r < 

for < r < ru 
for r > ru 



(5) 



while s'(r) is given by the same form with primed pa- 
rameters r'j^ and r[j. 

Here, the so-called SSD/E reparameterization of the 
model was used, for which the parameter values are e = 
0.152 kcal/mol, cr = 3.035 A, /i = 2.42 D, vq = 3.90 
kcal/mol, and — 0.07715, while the cut-off parameters 
in the functions s and s' are taken to be — 2.4 A, 
ru = 3.8 A, = 2.75 A and r'^ = 3.35 

The discontinuous interaction potential in our system 
is obtained from the smooth potential by the controlled 
energy discretization method presented in Ref.ll2l In this 
discretization, a cut-off naturally arises, and therefore no 
reaction field was included. 

The natural simulation technique for systems with dis- 
cretized potentials is discontinuous molecular dynamics, 
or DMD.ISHISJ jjj DMD, the dynamics of the system is free 
(no forces or torques are present) in between interaction 
events at which linear and angular momenta change. By 
its very nature, the energy-discretization scheme used in 
DMD is symplectic, time-reversible and strictly conserves 
the total (discretized) energy. It has no fixed time-step, 
but moves from event to event. The event frequency, 
which sets the efficiency of the method to a large degree, 
is determined by the level of discretization of the poten- 
tial energy: the finer the discretization, the more events 
occur. One typically finds that a discretization of the 
order of ^fcsT already suffices for a reasonably accurate 
simulation of the smooth system.^i^l In the simulations of 
which the results are presented below, the discretization 
of the potential energy was set to jksT. 

The orientational dynamics of free rigid bodies is 
an i mport ant aspect in the calculation of interaction 
times. The solution of the equations of motion for the 
angular momenta and attitude matrix A depends on the 
symmetry of the body through the principal moments of 
inertia. Although any value of the principal moments can 
be utilized to sample configurations of a system of rigid 
molecules, the DMD simulations presented here made use 
of the exact solution of the e quations of motion for a free, 
asymmetric rigid rotor ,^21111 and hence were based on the 
exact dynamics of the system. 

At the time of a distance measurement in the simu- 
lation, the forces and torques are most likely zero, since 
these quantities are non-zero only at discrete time points. 
This makes alternative smoothing methods, such as the 
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weighted residual methoc^ and metho ds ba sed on equi- 
Ubrium identities using (smooth) forcesp'^ not appUca- 
ble here. Since only the inter-particle distances are avail- 
able as input for the determination of the analytically- 
fitted radial distribution functions, the comparison with 
histogram-based radial distribution functions is more eq- 
uitable. 



III. REVIEW 

A. Smoothing probability densities with the 
Berg-Harris method 

Consider a random variable r which has a probabil- 
ity density p{r). Suppose that one has a sample of n 
data points {r^} that are independently drawn from the 
density p{r). How can one estimate p(r) from the data 
points? One way is to bin the data points into a his- 
togram. Because histograms are quite sensitive to statis- 
tical noise, Berg and Harris developed the following al- 
ternative procedure to obtain an analytical estimate for 
the probability density from the data.^l 

In the Berg-Harris method, the data are first sorted 
such that Vi < r^+i. Using the sorted data, the empirical 
cumulative distribution function F is defined in the range 
[ri,r„] as 



F(r) 



for n <r < ri+i. 



(6) 



Although F becomes a better approximation to the true 
cumulative distribution F{r) — f p{r') dr' with increas- 
ing sample size n, it is a function with many steps for 
any finite value of n so that its derivative is not analytic 
but rather consists of delta functions. 

The next step consists of writing the function F as 
a sum of a linear term and a Fourier expansion. The 
expansion is truncated at the mth term: 

m 

F{r) « F„(r) = Fo{r) + ^ sin[j7rFo(r)], (7) 

i=i 

where the linear term is defined as 

r — ri 
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Furthermore, the Fourier coefficients dj in Eq. ^ are 
determined from 



- , [Fir) - Foir)] sin[j7rFo(r)] dr. (9) 

rn - ri 

When F is approximated by F„i, the probability density 
p is approximately 
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Because F — Fq is zero at the end points of the interval 
[^1 , Tn] , and the Fourier modes form a complete orthonor- 
mal basis of the space of such functions, the empirical 
cumulative distribution F and the associated probability 
density are reconstructed exactly by Eqs. ([t]) and ( 10 1 as 
m —7- oo. However, as mentioned above, this limit would 
yield a series of delta functions for the probability den- 
sity p = Poo ■ The aim is to truncate the series at a level 
m which is not too high that one is fitting the noise, but 
high enough to give a good smooth approximation to F 
and p. 

The final step of the procedure is therefore to find the 
appropriate number of Fourier terms m in a statistically 
controlled fashion. The value of m is determined here 
using the Kolmogorov-Smirnov (K-S) test.!^^ This test 
determines how likely it is that the difference between 
the empirical cumulative distribution function F and its 
analytical approximation Fm is due to noise.l^ The test 
takes the maximum variation Dm between F and Fm over 
the sampled points (£>„ = maxj^i...„ {F^iri) - F(rj)|), 
and returns a probability Q„i — QiDm) that the differ- 
ence between the two cumulative distribution functions 
is due to chance. The functional dependence of Q on D 
is known to a good (asymptotic) approximation.l^II 

A small value of Qm indicates that the difference be- 
tween the cumulative distribution functions is statisti- 
cally significant, i.e. the quality of the expansion Fm is 
insufficient to represent the data. One therefore carries 
out a process of progressively increasing the number of 
Fourier modes m and evaluating Fm as well as Qm until 
the value of Q,„ is larger than some convergence thresh- 
old Qcut- A reasonable value for this convergence value 
is Qcut = 0.6. 

In Ref. ^ errors were estimated using the jackknife 
algorithm ,221 and the same method will be used here. 
However, since the inter-particle distance data naturally 
come in blocks, each corresponding to a single configura- 
tion, not all data points are independent. To account for 
this, we use a block-version of the jackknife method, in 
which a single block of data is omitted in each jackknife 
sample 



Prom probability densities to potentials of mean 
force 



To be able to apply the above smoothing method to ra- 
dial distributions, for which histograms are presently the 
method of choice, one has to make the connection be- 
tween a probability density on the one hand, and the ra- 
dial distribution function and potentials of mean force on 
the other hand. This connection will be briefly reviewed 
here because some of the details are needed below. 

Consider a system of a single type of particle, in which 
the number of particles is N and the volume of the sys- 
tem is V. The radial distribution function, denoted by 
gir), is the density of particles at a distance r away from 
a chosen first particle relative to the mean density N/V. 
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FIG. 1: Performance of the straightforward application of the Berg-Harris smoothing methods applied to the radial distribution 
functions goo (a), goa (b), and gnu (c) of the discretized water model. For comparison, the results from the histogram method 
are also shown (with error bars representing 95% confidence intervals at selected points) . Here the range over which the radial 
distribution functions are analyzed is defined to be [0, L/2], with L/2 = 12.4 A 



The true density at a distance r from a first particle is 
thus {N/V)g{r). The mean number of particles at a dis- 
tance between r and dr from a given particle is then 
n{r) — {N/V)g{r)4:nr'^dr. The Jacobian factor Airr^, 
which is, of course, the surface area of a sphere with ra- 
dius r, will play an important role below. The probability 
of a particle being at a distance r is equal to the number 
of particles with this r, divided by the total number of 
particles, N, i.e. p{r)dr ~ n{r)/N — g{r)4:7Tr'^dr /V, so 
one has the relation 



V 
47rr^ 



(11) 



Systems with different types of particles give rise to 
different radial distribution functions gij{r), where i and 
j label the kinds of particles. A similar argument to that 
above leads to the relation, 



V 

47J-J-2 



(12) 



where Pij(r) is the probability density of distances be- 
tween a particle j and a particle i. Once gij{r) is known, 
the potential of mean force <I>.ij is found frorrPl 

$,,(r) = -fcBTln.g,j(r), (13) 

where T is the temperature of the system and fc^ is Boltz- 
mann's constant. 

A further consideration for the applicability of the 
Berg-Harris method to radial distribution functions is 
whether the K-S test may be used at all. The K-S test 
assumes that the samples are independently drawn. Cor- 
relation between the samples may result in a bias in the 
radial distribution functions. Since nearby particles in an 
instantaneous configuration of the system are correlated, 
this is potentially an issue. If, however, the system is suf- 
ficiently large, only a small fraction of the samples in a 
single configuration will be spatially correlated, making 
the K-S test applicable to a very good approximation. 
Furthermore, if one ensures that the configurations arc 
taken from the simulation at sufficiently large time inter- 
vals, time correlations do not pose a problem either. 



IV. POOR PERFORMANCE OF THE 
STRAIGHTFORWARD SMOOTHING APPLIED 
TO RADIAL DISTRIBUTION FUNCTIONS 

Using the Berg-Harris method to get a smooth prob- 
ability densities p{r) from a sample of inter-particle dis- 
tances, one expects to obtain a good, smooth fit to g{r) 
by using Eq. (12). To test this expectation, the sys- 



tem of pure water in which rigid water molecules inter- 
act via a discretized potential energy derived from the 
soft sticky dipole model described in Sec. |Tl] was simu- 
lated. For all simulation results presented in this section, 
the parameters of the water model were as follows. The 
temperature is set at T — 298 K, the number of parti- 
cles is = 512, the cubic simulation box has sides of 
length L = 24.8A, so that the density is 1.0 kg/1. The 
principal moments of inertia of a rigid water molecule 
are 4 = 0.0337365 mj^.oA^ ly = 0.0635040 mH2oA^ 
and Iz — 0.0972405 77iH2oA^, where mnaO is the molec- 
ular mass of water. After equilibration, the simulations 
were run for 8 picoseconds, in which the inter-particle dis- 
tances were sampled every 2 picoseconds (long enough for 
the system to decorrelate), for a total of 4 configurations. 

From these data, the radial distributions goo, 5oh 
and guH were determined in the simulations through his- 
tograms and by following the smoothing procedure of 
Sec. |III A[ using a value for Qcut of 0.6. The results 
for the three radial distribution functions are shown in 
Fig. [T] Clearly, the two methods do not agree very well 
at short inter-particle separations, despite the fact that 
the smooth distance distributions are statistically good 
descriptions of the distance data according to the K-S 
test. In particular, note that the peak heights defining 
the first solvation shell are not well-described. 



V. EXTENDING THE BERG-HARRIS METHOD 

Given these unsatisfactory results, the histogram 
method would be preferred over the straightforward ap- 
plication of the Berg-Harris method. But there is a 
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way to fix the smoothing method to the extent that 
the smoothing method becomes preferable over the his- 
togram method. To understand how to fix the problem, 
one first needs to understand its underlying causes. 



A. Over-represented large distances 

1. Problematic Jacobian 

The convergence criterio n of the smooth approxima- 
tion relies on the K-S test.l^E^ This test is based only 
on the maximum variation D„i between F and F„i and 
is more sensitive to typical data points than to outliers. 
Although this may appear to be a contradiction at first 
glance, it is important that the K-S test depends on the 
maximum deviation in the cumulative distribution func- 
tion, rather than on the deviation of the random variable 
from the mean. To see that outliers do not constitute 
very deviant points in the cumulative distribution, sup- 
pose that, by chance, an outlier Xout from the far left tail 
of a distribution is found in a sample of size n. This would 
yield an increase of magnitude of 1/n for the empirical 
cumulative distribution F at Xout, while the cumulative 
distribution F{x) is practically zero since x is in the re- 
gion where the distribution is very small. If F^ is not 
too bad an approximation to F, F — F„i will also be of 
order 1/n at Xout- However, the deviation between F^ 
and F at other points x depends more on the goodness 
of fit than on the sample size, at least for points x in re- 
gions that contain a sizable fraction of the samples: these 
are the 'typical points'. For these points, there is little 
sample size dependence, so that to first order in n, one 
has F ~ F^ ^ 0{n°). Since 0{n-^) < 0{n°) for large 
enough sample sizes, the outlier Xout will not be seen as 
the most deviant point in the K-S test, but rather, some 
typical value x will have the most deviant cumulative 
distribution.l^Sl 

The above argument holds when the quality of the fit 
in the typical region is poor. However, as the number of 
Fourier modes used to fit F is increased, the quality of 
the fit of the cumulative distribution F improves in the 
typical region, and the maximum deviation shifts to less 
probable values. Because the probability density func- 
tion is small in the tails, this maximum deviation of the 
distribution is not that large and the convergence crite- 
rion (the K-S test) is met, resulting in a good fit in the 
typical region with a poor fit in the tails. 

The Berg-Harris method of Sec. |III A| therefore works 
well for probability densities without "long tails," since 
typical values of the variable will be the values of inter- 
est. However, for radial distribution functions, the focus 
on typical values is the origin of the difficulties getting 
the straightforward Berg-Harris method to work for ra- 
dial distribution functions (cf. Fig. [T]) . Typical values of 
r are of no interest in radial distribution functions. In 
a homogeneous system such as a fluid, the typical dis- 
tances between any two particles is on the order of half 



the system size, L/2. The length scales of interest in the 
radial distribution functions to describe local structure 
are typically much smaller than that. 

To make this point clearer, consider a dilute gas of 
hard spheres with diameter a. In the limit of infinite 
dilution, the radial distribution function g(r) is zero for 
r < a and unity otherwise. The probability density p{r) 
of inter-particle distances is then [cf. Eq. ( 11 )] 



Pdi\ntc(r) 



for 
Anr'^/V for 



r < a 
r > a. 



(14) 



Because of the Jacobian factor of 47rr^, the most likely 
values of r are of the order of the largest possible r, which 
is L/2. This means that in a sample of inter-particle dis- 
tances in the dilute hard sphere system, the large dis- 
tance samples (r^ = 0{L/2)) are much more abundant 
than the small distance samples (r^ = ©(cr)). Thus, when 
approximating p(r) by a smooth function using the K-S 
test, the long-distance part will be fitted very well, but 
the short-distance behavior (r = 0{a)) will not, because 
the test used is sensitive to the typical values of r, which 
are 0{L/2). 

The conclusion that large distance values overwhelm 
the smaller ones in which one is interested is valid beyond 
the dilute hard sphere case, since the Jacobian factor in 
Eq. ( 14 1 acts on long length scales and is therefore also 



present in systems of higher density with non-negligible 
interactions. 

A crude attempt at solving the large distance problem 
is to impose a cut-off r^ut on the allowed values of r in the 
sample of inter-particle distances. This cut-off should be 
of the order of the distance over which g{r) differs from 
one. But to determine that distance one has to mea- 
sure g{r). Thus, one would have to guess a value of the 
cut-off distance and adjust it until g(r) approaches one 
within a certain accuracy for r < Vcut- Unfortunately, 
even when the cut-off is chosen in this way, the results, 
although better than those in Fig. [TJ are still not very 
impressive, as Fig. |2] for the oxygen-oxygen radial dis- 
tribution function shows with rcut set to 7.44 A(using 
again Qcut = 0.6). One sees a lot of spurious oscilla- 
tions in the smooth radial distribution functions, which 
are due to the high number of Fourier modes needed to 
fit the radial distribution function (m = 18 in this case). 
Furthermore, these oscillations do not even agree within 
the 95% confidence interval with the histogram result, 
deviating especially for small distances r. T he e xplana- 
tion is that the Jacobian factor 47rr^ in Eq. (12 1 is still 



present within the restricted sample r < Tgut: and causes 
the peaks at larger r values to be fitted better than the 
peaks at smaller r values. 



2. Overcoming over-representation through resampling 



To remove the troublesome Jacobian factor in Eqs. ( 11 ) 



and ( 12 ) altogether, one can resample the inter-particle 
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FIG. 2: Result for goo of using a cut-off in the straightforward 
Berg-Harris smoothing, i.e., with the distance data restricted 
to r < Tcut ~ 7.44 A. For comparison, the results from the his- 
togram method are also shown (with error bars representing 
95% confidence intervals shown at selected points). 



distances found in the simulation. The idea is to draw 
from the original sample of distances, {vi} such that 
smaller values of r are more likely than larger values by 
introducing a relative bias weight w{ri) for each distance 
Ti in the sample to construct a new, resampled set {f^j.^^S! 
Given that the probability densities of the original sample 
points is p{r) = Airr"^ g{r) /V [cf. Eq. (11)], and provided 
that subsequent resampled data points are chosen inde- 
pendently with weight w{ri), the probability density of 
the resampled points is given by 



FIG. 3: The resampled and the standard Berg-Harris cumula- 
tive distributions associated with the radial distribution func- 
tion goo of the discrete water model. 



with 



47r ■ 
Vn 



(20) 



In detail, the resampling leading to data with a prob- 
ability density p can be performed as follows: Given the 
original set {r,} one determines the weight for each data 
point as 



= w{ri) 



1 



P{r) = k ^w{r)p{r) = z \^wir)g{r), (15) One also determines the maximum weight. 



where k ~ J w(r)p(r) dr and z = kV/{AT:) are constants. 
Note that z may be determined from 



Wmax = niax 

i—l...n 



V 
47r 



p{r)w{r)dr 



V 
47r 



(16) 



to convert the weights into probabilities 



which may be approximated by the average of the weight 
over the samples: 



1 



4=1 



w{ri). 



(17) 



In the application to the radial distribution functions be- 
low, we also obtained {w) from a numerical integration 
using the biased analytic approximation for p, the result 
of which differed from that given by the simpler expres- 
sion in Eq. (17) by less than 0.1%. 



The simplest way to counter the Jacobian factor 47rr^ 
in Eq. ( 15 ) is to choose a weight 



(21) 



(22) 



(23) 



which, by construction, lie between and 1. Next, one 
takes one of the original sample points at random (with 
equal weight), draws a random number ^ uniformly from 
the interval [0,1] and if ^ < Pi, one adds to the re- 
sampled data set {fi\. The procedure is repeated until 
enough resampled points have been gathered. There is 
some choice into what number of resampled points is to 
be taken. In our implementation, the number of resam- 
pled points is chosen to be the same as the number of 
points in the original sample. 

Note that Eq. (19 1 shows that the resampled prob- 



1 



which gives 



w{r) = 



'l{r) = zp{r), 



(18) 



(19) 



ability density p and the radial distribution function g 
are proportional to one another. Since g approaches 
one for large r, the large values of r are no longer over- 
represented in p. 

For the resampled data {fj}, one can use the Berg- 
Harris smoothing method of Sec.|III A to obtain a smooth 
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FIG. 4: The radial distribution function goo of the dis- 
crete water model, using the resampled smoothing method 
described. For comparison, the results from the histogram 
method are also shown (with error bars representing 95% con- 
fidence intervals at selected points). 



approximation to p. Fig.[3]shows the effect of resampling 
on the cumulative distribution functions associated with 
goo of the water model, using Qcut = 0.6. It is evident 
that the small-r tail of the unweighted (Berg-Harris) cu- 
mulative distribution is much enhanced in the resampled 
cumulative distribution, and therefore more relevant to 
the K-S test. Note that we have plotted only the fits, 
and not the empirical cumulative distributions, because 
the fits are hard to distinguish from the empirical cumu- 
lative distributions. The differences are easier to see in 
the associated radial distributions, which are plotted in 
Fig. |4] In contrast to Fig. [2] the smooth approximation 
now follows the result of the histogram within error bars 
for the whole range of r values. But like in Fig. [2] the 
supposedly smooth goo{f) exhibits fast oscillatory be- 
havior within those error bars. This oscillatory behavior 
will be addressed next. 




FIG. 5: Illustration of a piecewise approximation _Fo to the 
cumulative distribution. The function Fq (the solid line) is 
linear between the split points ai , ... as (indicated by the 
dash vertical lines). The function increases by an amount 
between split points and a,i+i. The values of a^j and 
ff_i used in this sketch roughly correspond to the piecewise 
analytic fit for the oxygen-oxygen radial distribution function, 
which requires k = A intervals to achieve convergence. 



It is conceivable that there are expansions other than 
the Fourier expansion that are more suitable for describ- 
ing a sharply increasing function followed by a more mod- 
erate behavior. However, it is not clear at present what 
this specialized expansion should be. Therefore, a sys- 
tematic and generic way will now be presented which 
avoids high-frequency modes in parts of g{r) that do not 
require them. This will also make the method more gen- 
erally applicable to discontinuous and other hard-to-fit 
densities. 

We note that in the following section, the resampling 
explained in the previous section is supposed to have been 
performed on the data already, and that the resampled 
data have been sorted (f^ < ri+i). For notational conve- 
nience, the tildes on r, p and F are omitted below. 



B. Hard-to-fit distribution functions 

1. Unphysical oscillations 

It is hard to reconcile the aim of having a smooth ap- 
proximation to the radial distribution functions and the 
large number of Fourier modes needed to approximate 
the shape oi g{r), which starts out as a very flat function 
at small r, then increases sharply, reaches a maximum not 
far beyond this sharp rise, and then decays on a larger 
scale to 1 in an oscillatory fashion. The peculiar shape 
of g{r) leads to the oscillatory behavior seen in Figures[2] 
and m Even when the oscillations lie within error bars 
of the histogram results, they reintroduce a remnant of 
the noise similar in magnitude to that present in the his- 
tograms that one is trying to avoid. 



2. Resolving the oscillation problem using a piecewise 
approach 

The decomposition of F in Eq. ^ can be adjusted 
to incorporate hard-to-fit radial distributions by allowing 
the Fourier decomposition to be different on sub-intervals 
within the total interval [ri,r„]. Let the full interval be 
divided into k sub- intervals [01,02], [02,03], [03,04], . . . , 
[ofc,afe+i], where oi — ri and a^+i = r„. The intervals 
will be labelled by a Greek index, which can run from 1 to 
k. How to choose the points o^ (where fj, — 2 . . . k, with 
oi = ri fixed) will be discussed later. Different intervals 
can have different numbers of samples that fall within 
its range. The fraction of the samples that fall within a 
given interval /i is denoted by 

On each sub-interval, F{r) is approximated by a linear 
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FIG. 6: The radial distribution function goo of the discrete 
water model, using the first piecewise resampled smoothing 
method described in Sec. |VB 2[ For comparison, the results 
from the histogram method are also shown (error bars were 
omitted to make the comparison clearer). 



part and a truncated Fourier transform of the remainder. 
Analogously to Eq. ([8|, the linear part in interval /i is 
given by 



(24) 



Note that Fq^ ranges from to 1 in the interval The 
piecewise linear approximation to F for r in interval /i is 
then 



Fa{r)^F[a^) + f^Fo^{r). 
where it should be noted that 



(25) 



(26) 



An example of such Ffj{r) is shown in Fig. [s] based on 
data from the oxygen-oxygen radial distribution function 
in the water model (as explained below). Approximations 
to F beyond Fq are found by adding Fourier modes for 
each interval, 

k rrif^ 

F{r)^Fo{r) + J2 /mXm (^) E '^m. Mj^Fo,. (0] • (27) 



Here, x^i is the characteristic function on interval /z, i.e., 

^f"^ ' \ otherwise, 
and c?^j is given by 



a^+i — a^i J a 



[F^{r)-Fo^{r)]sm[j7rFoM 

(29) 



F^{r) = f-'[Fir)-Fia^)]. 



(30) 



The function F^{r) is the conditional empirical cumu- 
lative probability distribution function for data points 
within interval /i. Eq. ( 27 1 represents a piecewise analytic 



approximation to the cumulative distribution. Since the 
Fourier modes form a complete orthonormal basis, the 
approximation becomes exact as — oo for all fi. As 
before, however, the should not become too large 
to avoid spurious oscillations, which only amounts to fit- 
ting the noise. One therefore defines a maximum number 
iTimax of Fouricr modes allowed in each sub-interval and 
sub-divides intervals if the maximum number of modes is 
not sufficient for convergence, as determined by the K-S 
test. 

The division of the original interval is chosen dynami- 
cally as follows: 

1. Start with one interval; 

2. Increase m until Q > Qcut; 

3. If m exceeds TOmax, find the most deviant point a 
according to the K-S test; 

4. Split the interval in two at the point a; 

5. Repeat for the conditional distribution for each in- 
terval left and right of a. 

Because the procedure is recursive, in each step the orig- 
inal one-interval procedure is used, which one exception: 
the allowable value of Q may be set to a lower value in a 
sub-interval, since there are fewer points in the interval 
and therefore the interval carries less statistical weight. 
We have not found a unique, statistically controlled way 
to adjust the Qcut for sub-intervals, but found heuristi- 
cally that scaling the Qcut for interval by avoids fit- 
ting the noise and leads to a satisfactory overall Q value. 

The recursive, multi-interval procedure is found to 
greatly speed up the convergence of the approximation 
scheme, leading to much lower m^, and thus fewer oscil- 
lations. It should be stressed that splitting at the most 
deviant point a in the K-S test is found to be essential 
here: choosing a different splitting point does not im- 
prove the convergence because the most deviant point 
is then still difficult to fit and the Q value for the sub- 
interval containing the most deviant point remains the 
same. 



Once the approximation in Eq. (27) has been obtained. 



the probability density is given by 



ttEc^p 



acos[j7rFo,,{r)]} (31) 



where /i is such that r S [a^,a^+i]. 

The results of applying the piecewise procedure to the 
5oo of the water model are shown in Fig. [6] (remember- 
ing that g = zp), with mmax = 14 and the initial Qcut set 
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FIG. 7: The resampled, piecewise-analytic method apphed to the radial distribution functions goo (panel a), gon (panel b) 
and guH (panel c) of the discretized water model, using the patched piecewise resampled smoothing method. For comparison, 
the results from the histogram method are also shown, while error bars were omitted for clarity. 



to 0.6. While the method now works better than without 
the piecewise approach, it has one drawback: the deriva- 
tive of the approximate cumulative distribution function, 
which gives g{r), need not be a smooth function across 
the different intervals. As a consequence, the result in 
Fig. [6] shows artificial discontinuities. These discontinu- 
ities at the boundaries of the intervals fall within the 95% 
confidence intervals (not shown in Fig. [6]). 



3. Dealing with spurious discontinuities 

Provided the underlying probability distribution of the 
samples is continuous, the spurious discontinuities that 
are present in the piecewise-analytic fit will get smaller 
as the number of sample points increases. Nevertheless, 
for cases with poor statistics, one would like to have an 
approach that gives a continuous piecewise analytic fit to 
the distribution. In fact, for some applications, having a 
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FIG. 8: Comparison between the radial distribution function 
goo of the discrete water model obtained using the resam- 
pled piece-wise analytic method, and using a spline fit to the 
histogram results (error bars were omitted to make the com- 
parison clearer). 



continuous radial distribution function is essential, such 
as for Brownian simulations that take the potential of 
mean force as input. Even though the discontinuities are 
within the statistical noise, they would lead to sudden, 
unphysical changes in energy in such applications. 

There are many ways to get a continuous curve out 
of the discontinuous one, but it should be remembered 
that one is working within the statistical noise. We can 
therefore choose any method as long as the "patch" is 
still statistically reasonable. To determine the statistical 
suitability, one can once again use the K-S test on the 
patched fit. 

We have chosen the following simple procedure: We 
apply a quadratic patch function around each disconti- 
nuity at a^, 



if r e [a^ - Cp 
if r e au,a„ - 



(32) 

The width of the patch is adjustable, while the prefac- 
tor 6p follows from the requirement that the cumulative 
distribution F + y^^,, Ff'^^'^^ has a continuous derivative 
a.t r ~ a^. Provided the different patches do not overlap, 
this leads to a value of 



Ap 

4:C' 



(33) 



where Ap is the height of the jump in the unpatched 
distribution function. 

Initially, each width is set equal to half the mini- 
mum interval size on either side of the corresponding split 
point to avoid overlap between the patches. The Q 
value of the patched distribution is then determined, and 
if it is not smaller than the Q value for the unpatched 
distribution, the patch is accepted. Otherwise, the width 
of the patch is reduced by a factor of two, until the Q 
value is acceptable.'^ It will be demonstrated below that 
this solves the problem of spurious oscillations and dis- 
continuities. 
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FIG. 9: The resampled, piecewise-analytic method apphed to the potentials of mean force $00 (panel a), $oh (panel b) and 
$HH (panel c) of the discretized water model. For comparison, the results from the histogram method are also shown. 



Final results 



VI. FURTHER APPLICATIONS 



Figs. [7| show the result of the resampled, piecewise pro- 
cedure for the three radial distribution functions of the 
water model, goo, 5oh and 5hh, with mmax — 14 and 
Qcut = 0.6. It is clear that the piecewise analytic fit 
is now smoother than the histogram, hardly shows any 
oscillations and is continuous. Furthermore, while error 
bars were omitted in Fig. [7] for clarity, it was found that 
the histogram and the patched, piecewise analytic results 
are in mutual agreement within the 95% confidence in- 
tervals. 

It may be argued that plotting the histograms using 
bars is an unfair way of representing the histogram re- 
sults. One often takes the histograms and applies a cubic 
spline to the results, which makes the graph seem 
smoother. There is of course no a priori reason why the 
splined histogram should represent the radial distribution 
function better. For this reason, we have shown only 'un- 
splined' histograms so far. The comparison between the 
histograms and smooth approximations in these curves 
should really only be done at the mid-point of the his- 
togram bins. Now that the piecewise analytic smoothing 
approach is fully developed, however, it is interesting to 
see how it compares to a smoothed spline fit of the his- 
togram results. Figure |8] shows a plot of these two types 
of smooth results for goo- The bin size for the histograms 
has been chosen such that the first peak of the radial dis- 
tribution is well resolved. One sees that the cubic spline 
fit does a reasonable job for the first peak, but that in 
the second peak in the radial distribution function the 
spline fit is still noisy, while the statistically controlled 
piecewise analytic result is not. Thus, the splines cannot 
fix the roughness of the histogram method, at least not 
when bin sizes are uniform. 

To show how the piecewise analytic method performs 
for potentials of mean force, the smooth potentials of 
mean force between the different species corresponding to 
the radial distributions in Figs, [t] [cf. Eq. ( 13 )] , have been 
plotted in Figs. [9] The results from the piecewise analytic 
method are considerably smoother than the histogram 
results, but still exhibit some roughness since they were 
based only on four configurations. 



The piecewise approximation method is not specific to 
radial distribution functions, but can help to fit any prob- 
ability density that is hard to fit with a truncated Fourier 
series. Below, we will give several examples of such den- 
sities and show the advantages of using the piecewise an- 
alytical approximation method. 



A. A discontinuous density 

Consider a random variable r with a distribution p 
jiven by 



p{r) 



if r < 
if < r < I 

if r > i 



(34) 
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FIG. 10: The piecewise-analytic method applied to 50, 000 
samples drawn from the discontinuous distribution of 
Sec. |VI A[ Also shown for comparison are the results from 
the histogram method and the Berg-Harris method. For clar- 
ity, error-bars on the smoothing methods and the histogram 
have been omitted. 
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FIG. 11: The piecewise-analytic method applied to 50,000 
samples drawn from the Cauchy distribution (cf. Sec. VI B I. 
For comparison, the results from the histogram method and 
the exact result are also shown. For clarity, error-bars have 
been omitted. Because the piece- wise analytic result is hard to 
distinguish from the exact distribution, the difference between 
the two is plotted in Fig. [12] 



FIG. 12: The error in the piecewise-analytic method ap- 
plied to 50, 000 samples drawn from the Cauchy distribution 
(cf. Sec. VI B I as compared to the exact form in Eq. ( 35 1 . The 
error is here defined as the deviation from the exact result. 
For comparison, the errors in the Berg-Harris method are also 
shown. 



Note that vi^ithin the domain of this function [0, |], there 
is a discontinuity at r — ^. Discontinuities are very 
poorly represented by truncated Fourier series.^^^ 

From the above distribution, 50,000 samples were 
dravi^n and used as input to the piecewise analytic approx- 
imation (with TOinax = 14 and the initial Qcut = 0.6), as 
well as to the one-interval analytic approximation (with 
the same Qcut — 0.6 and unrestricted Wmax) and the 
histogram method. The results are shown in Fig. |10[ 

One clearly sees the trouble that the Berg-Harris one- 
interval approximation has in capturing the discontinu- 
ity, while the histogram is very noisy. The piecewise an- 
alytic expansion, on the other hand, beautifully captures 
the whole distribution; the automated procedure divides 
the interval [0, |] in two at r = 1 and then needs zero 
Fourier modes to approximate the separate pieces. 



B. The standard Cauchy distribution 



Not plotted in Fig. 11 were the results of the original 



Berg-Harris method, which, surprisingly, are so similar to 
the piecewise approach that they would be hard to make 
out in the figure. The only statistically significant differ- 
ence between the two results is apparent in the height of 
the maximum, which the Berg-Harris fit slightly under- 
estimates. 

The success of both methods is illustrated further in 
Fig. [12] in which the errors in the piecewise and the one- 
interval results are compared; the piecewise analytic er- 
ror is overall slightly smaller than that of the Berg-Harris 
method, but not by much. This success seems to contra- 
dict Berg and Harris' warning against using the smooth- 
ing method for densities with long tails. It is possible the 
relatively high quality of the fit in the simple approach is 
due to the symmetric nature of the Cauchy density and 
its relative lack of structure. 

A more stringent test of the method applied to long- 
tailed distributions will be presented next. 



According to Berg and Harris,-- the original smoothing 
method has trouble with densities with long tails. It 
is therefore interesting to see if the piecewise approach 
helps for these kinds of densities as well. As an example, 
we consider the standard Cauchy distribution 



p{r) 



1 



7r(l 



(35) 



From this distribution, 50,000 samples were randomly 
drawn and used in the piece- wise approach (with rrimax = 
14 and the initial Qcut = 0.6). The results are contrasted 
with those of the histogram in Fig. [TT] The piecewise 
smooth approximation performs so well that it can hardly 
be distinguished from the Cauchy distribution. 



C. First arrival time distribution of a diffusing 
particle to a sphere 

Consider a particle undergoing diffusion, starting at a 
distance i?o from the origin, i.e. anywhere on a sphere 
of radius Rq. The distribution function p{r,t) of the 
diffusing particle satisfies dtp — DW^p, where D the 
self-diffusion constant. Investigating the time t required 
to arrive anywhere on the surface of a sphere of radius 
Rmin for the first time is a classic case of a first passage 
problem.'^S! Such problems have applications in the rate 
of molecules finding each other in a solution, and are a 
major determining factor of reaction rates in diffusion 
limited reactions. 
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FIG. 13: The piecewise-analytic method applied to 100, 000 
samples drawn from the first passage time distribution of 
Sec. |VI C| Also shown for comparison are the results from 
the histogram method and the Berg-Harris method, as well 
as the exact result. 



FIG. 14: The relative error in the piecewise-analytic method 
applied to 100, 000 samples drawn from the first passage time 
distribution of Sec. |VI C| compared to the relative error in 
the Berg-Harris method. The errors are defined here as the 
deviation from the exact result. 



It turns out that the first arrival times are distributed 
according tcP^ 



Pit) 



exp 



X 



(36) 



where x = Rq — R„ 



This distribution function has 



both short time structure and a long tail. 

From the distribution in Eq. (36 1, 100,000 samples 



were randomly drawn, with parameters set at D = 1, 
Rmin = 1 (these choices set the time and length units) 
and i?o = WRmin- The piece- wise approach was used to 
obtain an estimate of the probabihty distribution (with 
"T-max = 14 and Qcut = 0.6), as well as the one-interval 
analytic approximation (Qcut = 0.6 and unrestricted 
TOmax) and the histogram method. The results are com- 
pared in Figs. [13] and [14] 

While all methods work to some extent, the histogram 
method would have to be tailored to the function in ques- 
tion in order to be useful, with differently sized bins for 
different values of t. The Berg-Harris method works rea- 
sonably well without subdivisions, but exhibits fast os- 
cillations in the tails of the density, as becomes very ap- 
parent from Fig. [T4j The appearance of oscillations is 
not surprising when one considers that m ~ 390 Fourier 
modes were needed for the Berg-Harris fit! The piece- 
wise analytic result does not exhibit fast oscillations in 
the tail. Furthermore, the piecewise fit only required a 
total of 28 Fourier modes distributed over 8 intervals. Al- 
though the piecewise result does have some modulation 
within the error bars, we have checked that almost all of 
these disappear when one takes 10 times as many data 
points. At that level of statistics, the Berg-Harris results 
still have oscillations in the tails, and still require over 
350 Fourier modes. 

When the analytical form of p{t) is not known, such as 
for absorption in non-spherical geometries, but samples 



{ti} are available from numerical simulation, the piece- 
wise approach would give the best description of p{t): 
one that is less noisy than the histogram construction 
and with fewer oscillations than the one-interval Berg- 
Harris method. 



VII. CONCLUSIONS 

In this paper, a method to obtain smooth analyt- 
ical estimates of probability densities, radial distribu- 
tion functions and potentials of mean force in a statis- 
tically controlled fashion without using histograms was 
presented. This method only uses direct samples of data 
(distance samples in the case of radial distribution func- 
tions). Since this method is expected to be generally use- 
ful, we have made our implementation, coded in c and 
C++, available on the web.l^SI 

While the method is based on the Berg-Harris method, 
the statistical criterion used in that method is most sensi- 
tive to the most common samples, which for radial distri- 
bution functions are not the ones of physical interest. To 
make the method work for radial distribution functions, 
a weighted resampling of this data was required. Spu- 
rious oscillations, allowed by the statistical noise, were 
eliminated using a piecewise approach. In addition, one 
can optionally patch the piecewise-analytic form to avoid 
discontinuities within the errors, if desired. The resam- 
pled, piecewise smoothing method was demonstrated on 
data from event-driven DMD simulations of water, and 
proved to give a much smoother result than the histogram 
method. 

If the purpose of a simulation is just to get a good 
smooth result for the radial distribution functions of a 
water model, one could use the histogram method with 
longer simulation runs to reduce the statistical uncer- 
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FIG. 15: The distribution of the relative errors in the 
piecewise-analytic method apphed to 100, 000 samples drawn 
from the first passage time distribution of Sec. |V1 C| com- 
pared to the distribution of the relative error in the Berg- 
Harris method. The errors are defined here as the deviation 
from the exact result. The points with error bars are found 
by applying the piecewise analytic method to the deviations 
plotted in Fig. 
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(with errors from the jackknife procedure), 
while the drawn lines are fits to Gaussians with a zero mean. 



tainties to some predetermined limit. However, the his- 
togram would still be biased and known just at a lattice 
of points if a uniform bin size is used. To get a smoother 
and less biased curve from the histogram, one would also 
have to decrease the bin size by hand. The piecewise 
analytic method makes such tuning unnecessary. The 
method also make much longer runs unnecessary, since 
what appears to be very poor statistics for the histogram 
method turns out to be quite reasonable statistics for the 
piecewise analytic method. Remember that the same set 
of inter-atom distance are used in both methods. Appar- 
ently, there is much more statistics in the sample than 
the histogram is using (something that Berg and Harris 
also noticecP). 

For the test case considered here, the computational 
cost of doing longer runs is not that large, so the advan- 
tages of the piecewise analytic method may seem to be 
nice but not necessary. In other applications, however, 
such as in studies of the distribution of water molecules 
near a polymer or a bio-molecule (or near one of their 
polymer units), better statistical information is costly 
to obtain because such systems are not only computa- 
tionally more demanding, but there are far less samples 
available in a single configuration since only the water 
molecules near the polymer or the bio-molecule are in- 
volved. The simulation run times would have to be much 
larger to compensate for this poor statistics, if histograms 
are used. In such cases, the piecewise analytic method 
is expected to be advantageous since it appears able to 
use more of the information present in the inter-particle 
distance sample. 

The piecewise analytic method yields a smooth approx- 



imation to the probability density function, and the de- 
viations of the empirical distribution from this smooth 
curve are supposed to be due to statistical noise. If this is 
true and the methods are unbiased, then for large enough 
sample size n, one would expect the distribution of errors 
in the probability density functions to be Gaussian with 
a zero mean. To test whether this is the case, the prob- 
ability densities of the relative errors plotted in Fig. [M] 
were determined, both for the Berg-Harris and for the 
piecewise analytic method.'^ The results are shown in 
Fig. [15] Within statistical uncertainty, both distribu- 
tions were found to be unbiased. Furthermore, for the 
piecewise analytic results, the error densities are roughly 
Gaussian. The errors from the Berg-Harris method, on 
the other hand, seem to show deviations from Gaussian 
behavior. The non-Gaussian nature of the errors of the 
Berg-Harris method might be due to the fact that the 
sample noise is fitted too closely since a large number 
of Fourier modes are necessary, which means the dis- 
tribution of errors follows the distribution of the finite 
sample-size errors rather than being truly random. The 
Gaussian nature of the errors in the piecewise analytic 
method might be a confirmation that in the smooth ap- 
proximation, enough Fourier modes have be taken into 
account for the remainder to be due to a random statis- 
tical noise. 



One of the nice features of the piecewise analytic 
method is that one does not have to choose a bin size 
a priori, as one has to do in the histogram method. It is 
nonetheless true that within the smoothing method, one 
is to some extent free to choose the cut-off value Qcut 
of the 'quality' parameter Q and the maximum number 
TOmax of basis functions allowed in the expansion of each 
interval. Note that both Qcut and mmax a-re dimension- 
less numbers, and do not contain physical parameters, in 
contrast to the bin size parameter required in the his- 
togram approach. Setting Qcut too low may result in a 
bad fit to the data, while setting TOmax too large may 
result in noise fitting and artificial oscillations. We have 
found that Qcut ~ 0.6 and mmax ~ 14 are reasonable 
choices, and these values were used in all the applica- 
tions presented above. 



There is also some freedom in the choice of the set of 
basis functions, as long as they form a complete set. The 
Fourier basis used here is convenient and familiar, but 
others are possible. For instance, in Ref. [51 a Chebyshev 
expansion was used. This expansion was chosen over 
the Fourier expansion because it gave fewer oscillations. 
The piecewise approach present in this paper, however, 
resolves the oscillation problem as well, and is expected to 
be less sensitive to the choice of basis functions than the 
original Berg-Harris method. Furthermore, the method 
was shown to also be applicable to other potentials of 
mean force and other probability densities with a hard- 
to-fit character. 
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